Spotting Fake Reviews using Positive-Unlabeled Learning
Abstract
Fake review detection has been studied by researchers for several years. However, all studies reported so far are based on English reviews. This paper reports a study of detecting fake reviews in Chinese. Our review dataset comes from Dianping, a Chinese review hosting site that has built its own fake review detection system. Dianping is confident that its algorithm has very high precision, but it does not know the recall. In other words, the reviews the system flags are almost certainly fake, but the remaining reviews may not all be genuine. This paper first reports a supervised learning study with two classes, fake and unknown. However, since the unknown set may contain many fake reviews, it is more appropriate to treat it as an unlabeled set. This calls for learning from positive and unlabeled examples (PU learning). Experimental results show that PU learning not only significantly outperforms supervised learning but also detects a large number of potentially fake reviews hidden in the unlabeled set that Dianping's system fails to detect.
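The abstract does not say which PU algorithm the paper uses, so the following is only an illustration: a minimal Python sketch of one well-known PU-learning strategy, the two-step "spy" technique, with a multinomial Naive Bayes classifier standing in for whatever model the authors actually trained. The function names `train_nb` and `spy_pu`, the parameters `spy_frac` and `noise`, and the toy tokenized reviews are all hypothetical; none of them come from the paper or from Dianping's system. The idea: hide a few known-fake reviews ("spies") in the unlabeled set, use the scores the classifier gives those spies to pick a threshold, treat unlabeled reviews scoring below it as reliable negatives, and retrain on fake vs. reliable negatives only.

```python
# Hypothetical sketch: spy-style PU learning with Naive Bayes.
# This is NOT the paper's implementation; names and data are illustrative.
import math
import random
from collections import Counter


def train_nb(pos_docs, neg_docs):
    """Train multinomial Naive Bayes (Laplace smoothing) on token lists.

    Returns a function mapping a token list to its log-odds of the
    positive (fake) class; a score above 0 means "more likely fake".
    """
    vocab = {w for d in pos_docs + neg_docs for w in d}
    pos_counts = Counter(w for d in pos_docs for w in d)
    neg_counts = Counter(w for d in neg_docs for w in d)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    v = len(vocab)
    prior = math.log(len(pos_docs) / len(neg_docs))

    def log_odds(doc):
        score = prior
        for w in doc:
            if w not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log((pos_counts[w] + 1) / (pos_total + v))
            score -= math.log((neg_counts[w] + 1) / (neg_total + v))
        return score

    return log_odds


def spy_pu(positives, unlabeled, spy_frac=0.2, noise=0.1, seed=0):
    """Two-step spy-style PU learning; returns True (fake?) per unlabeled doc."""
    if len(positives) < 2 or not unlabeled:
        raise ValueError("need at least 2 positive and 1 unlabeled document")
    rng = random.Random(seed)
    shuffled = list(positives)
    rng.shuffle(shuffled)
    # Hide some known fakes ("spies") in the unlabeled set,
    # keeping at least one positive for training.
    k = min(len(shuffled) - 1, max(1, int(spy_frac * len(shuffled))))
    spies, rest = shuffled[:k], shuffled[k:]

    # Step 1: known fakes (minus spies) vs. unlabeled plus spies.
    step1 = train_nb(rest, unlabeled + spies)
    spy_scores = sorted(step1(d) for d in spies)
    # Threshold below which only about a `noise` fraction of spies fall.
    idx = min(len(spy_scores) - 1, int(noise * len(spy_scores)))
    t = spy_scores[idx]
    reliable_neg = [d for d in unlabeled if step1(d) < t]
    if not reliable_neg:
        raise ValueError("no reliable negatives found; try a larger noise value")

    # Step 2: all known fakes vs. the reliable negatives only.
    step2 = train_nb(positives, reliable_neg)
    return [step2(d) > 0 for d in unlabeled]


if __name__ == "__main__":
    # Toy data (hypothetical): "fake" plays the role of filter-flagged reviews.
    fake = [s.split() for s in (
        "amazing best restaurant ever must visit",
        "best food ever amazing amazing",
        "must visit best ever perfect",
        "amazing perfect best ever",
        "perfect amazing must visit",
    )]
    unknown = [s.split() for s in (
        "best ever amazing must visit",
        "food was ok service slow",
        "noodles cold waiter slow ok",
        "decent price ok portions small",
        "service slow food cold",
        "ok place decent noodles",
    )]
    print(spy_pu(fake, unknown))  # prints [True, False, False, False, False, False]
```

Calling `train_nb(fake, unknown)` directly corresponds to the fake-vs-unknown supervised setup the abstract starts from, in which every unknown review, including the fake-looking first one, is trained on as genuine. In this toy data the spy step keeps that review out of the negative training set, which is the intuition behind treating the unknown set as unlabeled rather than negative.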
Similar resources
Fake Review Detection: Classification and Analysis of Real and Pseudo Reviews
In recent years, fake review detection has attracted significant attention from both businesses and the research community. For reviews to reflect genuine user experiences and opinions, detecting fake reviews is an important problem. Supervised learning has been one of the main approaches for solving the problem. However, obtaining labeled fake reviews for training is difficult because it is ve...
What Yelp Fake Review Filter Might Be Doing?
Online reviews have become a valuable resource for decision making. However, their usefulness brings forth a curse: deceptive opinion spam. In recent years, fake review detection has attracted significant attention. However, most review sites still do not publicly filter fake reviews. Yelp is an exception which has been filtering reviews over the past few years. However, Yelp’s algorithm is trad...
Keyword spotting for self-training of BLSTM NN based handwriting recognition systems
The automatic transcription of unconstrained continuous handwritten text requires well trained recognition systems. The semi-supervised paradigm introduces the concept of not only using labeled data but also unlabeled data in the learning process. Unlabeled data can be gathered at little or no cost. Hence it has the potential to reduce the need for labeling training data, a tedious and costly ...
Positive Unlabeled Learning for Deceptive Reviews Detection
Deceptive reviews detection has attracted significant attention from both business and research communities. However, due to the difficulty of human labeling needed for supervised learning, the problem remains highly challenging. This paper proposed a novel angle to the problem by modeling PU (positive unlabeled) learning. A semi-supervised model, called mixing population and individual p...
Modeling Consumer Learning from Online Product Reviews
We propose a structural model to study the effect of online product reviews on consumer purchases of experiential products. Such purchases are characterized by limited repeat purchase behavior of the same product item (such as a book title) but significant past usage experience with other products of the same type (such as books of the same genre). To cope with the uncertainty in quality of the ...
Journal title:
- Computación y Sistemas
Volume 18, Issue
Pages -
Publication date: 2014